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Protein sequence alignment is essential for template-based protein structure prediction and function 
annotation. We collect 20 sequence alignment algorithms, 10 published and 10 newly developed, which 
cover all representative sequence- and profile-based alignment approaches. These algorithms are 
benchmarked on 538 non-redundant proteins for protein fold-recognition on a uniform template library. 
Results demonstrate dominant advantage of profile-profile based methods, which generate models with 
average TM-score 26.5% higher than sequence-profile methods and 49.8% higher than sequence-sequence 
alignment methods. There is no obvious difference in results between methods with profiles generated from 
PSI-BLAST PSSM matrix and hidden Markov models. Accuracy of profile-profile alignments can be further 
improved by 9.6% or 21.4% when predicted or native structure features are incorporated. Nevertheless, 
TM-scores from profile-profile methods including experimental structural features are still 37.1% lower 
than that from TM-align, demonstrating that the fold-recognition problem cannot be solved solely by 
improving accuracy of structure feature predictions. 



Template-based modeling (TBM) is by far the only reliable approach to protein 3D structure prediction''^. 
With rapid accumulation of experimental structures in the Protein Data Bank (PDB)\ TBM plays an 
increasingly important role in protein structure determination and structure-based function annotation 
studies as more and more protein structures become available as putative templates. In fact, recent studies showed 
that the current PDB library has already approached completeness in structural space*'^. Nevertheless, only 
around 2/3 of targets can have the templates reliably identified by current threading (or fold-recognition) 
methods in genome-wide protein structure prediction'' A critical issue for protein template identification is 
the correct construction and scoring of the target-to-template alignments of amino acid sequences. 

Early efforts on protein sequence alignments can be traced back to the 1970s when Needleman and Wunsch 
pioneered a global alignment algorithm for protein sequences via dynamic programming recursion'". Smith and 
Waterman extended the algorithm for identifying highly conserved subsequence motifs by local alignments". 
However, dynamic programming is too slow for scanning large-scale sequence databases. Altschul, Lipman and 
coworkers developed FASTA and BLAST based on a heuristic search and extension of common sequence patterns 
(words) among the compared sequences, which significantly increases the speed of sequence alignment and 
database search'^'''. Later, the authors extended BLAST to PSI-BLAST which improves the sensitivity of 
sequence-sequence alignments'"*. The key idea of PSI-BLAST is to generate multiple sequence alignments 
(MSAs) by iterative sequence database search, where a sequence profile in terms of a position-specific scoring 
matrix (PSSM) is constructed from the MSAs and used to enhance the accuracy of sequence alignment by 
sequence-profile comparisons. 

The idea of sequence profiles has revolutionized the sequence alignment search and template-based protein 
structure prediction'""'"". A variety of profile alignment based threading methods have been recently developed for 
efficient protein homologous template identification and structure prediction '^"^'; most of the methods rely on 
PSI-BLAST for MSA search and profile generations. The multiple sequence alignments and sequence profiles 
can also be created by hidden Markov models (HMMs), which are represented by a chain of match and insert/ 
deletion nodes with the MSAs corresponding to the paths with the highest probabilities given by the product of 
amino acid emission and insertion/deletion probabilities^^. Typical HMM-based threading algorithms include 
SAM^^ and HHsearch^", where SAM is based on HMM-sequence alignments and HHsearch on HMM-HMM 
alignments. 
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In addition to sequence profiles, a variety of structure features have 
been recently introduced to improve the alignment accuracy. For 
example, secondary structure predictions from neural network train- 
ings^ are used by almost all contemporary threading/alignment 
programs" "'^''. Other structural characteristics, including residue- 
residue contacts, backbone torsion anglesS*", solvation^' and residue 
depth", are often exploited in protein threading approaches"'^''^'*'^'^. 

Despite the extensive effort made in developing sequence align- 
ment algorithms, little is known about the relative performance of the 
methods. In particular, a number of critical questions remain to be 
addressed; for example, what are the quantitative differences of 
sequence- versus profile-based or local- versus global-alignment 
methods on close- and distant-homology detections? As the two 
most often-used approaches, what are the strength and weakness 
of PSSM- and HMM-based profiles in sequence alignments? How 
much can we possibly gain in fold- recognition by developing the best 
structural feature prediction methods? The answer to these questions 
is of essential importance for guiding the uses within the biology 
community, as well as for leveraging future method development 
studies in the field. 

As community-wide platforms, the CASP'"'", CAFASP'' and 
Livebench''' experiments provided valuable opportunities for critical 
assessments of various threading methods. However, one limit of the 
assessments is due to the fact that predictors in the experiments 
usually exploit different template libraries, the construction of which 
can have important impact on the final modeling results. Meanwhile, 
the number of targets involved in the experiments is limited (~ 100) 
and unbalanced; most of the targets are closely homologous to the 
experimental structures, which are easy to be detected^*. The pro- 
blems have been partly addressed by several of previous studies that 
compared different sequence alignment methods on large sets of 
benchmark proteins'^ For example. Park et al*' and Madera and 
Gough''s compared HMM- and PSSM-based profiles on the datasets 
collected from SCOP'" and found that HMM-based profiles can 
detect more homologous relationships than PSSM-based profiles. 
Dunbrack and coworkers''^" examined different sequence alignment 
tools using structure alignments as the gold standard and found that 
sequence-profile alignments by PSI-BLAST are only slightly more 
accurate than sequence-sequence alignments by BLAST but PSI- 
BLAST achieves much longer alignments. Girshin and coworkers'"' 
evaluated the alignment methods in multiple reference-dependent/ 
independent and global/local modes and showed that different 
aspects of evaluation reveal different properties of the methods. 
Barton and coworkers'"* developed a multiple-level benchmark suite 
to evaluate eight alignment methods and concluded that the majority 
of alignment improvements since 1985 were due to pair-score mat- 
rices rather than algorithmic refinements. Elofsson''" compared dif- 
ferent sequence alignment and threading algorithms and found that 
the alignment difference among different methods occurs mainly in 
the region of 15-20% sequence identity where secondary structure 
prediction and PSI-BLAST profiles are the major driven force of 
alignment improvements. 

Despite the valuable insights revealed, most of the benchmark 
studies focused on a limited set of traditional sequence alignment 
algorithms and were performed nearly a decade ago. Many recent 
developments, e.g. structural feature integrations and HMM-HMM 
alignments which are important for protein structure prediction, are 
yet to be assessed. Meanwhile, the testing datasets used in these 
studies were mostly collected from the SCOP library and largely 
belong to the easy homology category (which represents a similar 
problem in the CASP experiments mentioned above), while the per- 
formance of the methods on detecting hard distant-homology tem- 
plates, which are more challenging to the field, needs to be 
appropriately examined. 

In this work, we aim to develop a comprehensive and balanced 
experiment to systematically examine the strength and weakness of 



various up-to-date sequence alignment methods. Ten publicly avail- 
able methods and ten in-house methods specially designed for 
concept testing, which constitute a representative set of various align- 
ment/threading approaches, were installed on the local computer 
cluster. These methods are tested on a large set of 538 proteins con- 
sisting of a balanced category distribution of difficulty (i.e. including 
similar number of easy, medium and hard protein targets), based on 
a uniform set of template structure libraries. We conducted a detailed 
analysis on the benchmark results to address a series of critical ques- 
tions in sequence alignment and template-based protein structure 
prediction, which aim to provide insightful guidance for biological 
use and future method developments. All alignments and modeling 
data in this study can be downloaded at http://zhanglab.ccmb.me- 
d. umich.edu/publicdata/benchmarkl. 

Results 

Dataset and template library. All the sequence alignment programs 
are benchmarked on the same set of 538 non-redundant proteins 
randomly collected from the PDB library'. These proteins have a 
pair- wise sequence identity less than 30% and length ranging from 
34 to 804 residues. Proteins with broken chains or missing residues 
were not included. The sequences were divided into three categories: 
Easy, Medium and Hard targets, based on the consensus confidence 
score of the meta-threading LOMETS program"**, which consists of 9 
protein threading programs (dPPAS, MUSTER, HHsearch-I, 
HHsearch-II, PPAS, PROSPECT, SAM, SPARKS and SP3). A 
target is defined as Easy if at least one strong template hit can be 
detected for the target by each program with the Z-score higher than 
the confidence cutoff; a target is defined as Hard if none of the 
threading programs has a strong template hit; otherwise, it is 
considered a Medium target. In total, the 538 proteins are selected 
to include a balanced category distribution of difficulty with 137 
Easy, 177 Medium, and 224 Hard targets. Here, we have put more 
focus on the challenging targets by arbitrarily increasing the number 
of Medium and Hard proteins in our benchmark protein set, 
although a naturally collected sample from the PDB would have a 
much lower portion of Medium/Hard cases. A list of all the 538 
proteins, together with the classification, are provided in http:// 
zhanglab.ccmb.med.umich.edu/publicdata/benchmarkl/protein_ 
types.txt. 

The existence of template structures in the library is a precondition 
for template identification. To eliminate potential bias of the align- 
ment algorithms from the template structure library, we constructed 
the libraries of all threading programs using the same sequence iden- 
tity cutoff updated to the same time stamp (by Jan, 2013). In fact, the 
template libraries for NW-align, SW-align, BLAST, PSI-BLAST, 
PSA, PPA, PPAS, dPPAS, MUSTER, SAM, PRC, PROSPECT, 
SPARKS, SP3 and FFAS are generated from the same set of non- 
redundant PDB proteins with a pair-wise sequence identify cutoff 
70% (see http://zhanglab.ccmb.med.umich.edu/library/). The librar- 
ies for HHsearch-I and HHsearch-II are downloaded from ftp:// 
toolkit.lmb.uni-muenchen.de, which has also a sequence identity 
cutoff of 70%. The size of these two libraries is about the same. 
The programs of all the tested methods are described in METHODS. 

Summary of performance by individual alignment methods. 

Table 1 presents a summary of the 3D structural models, which are 
built by copying the framework of the highest ranked and the best in 
the top ten scoring templates based on the alignments generated by 
different alignment programs. The quality of alignments is generally 
measured by the root-mean-square deviation (RMSD) of the models 
(Columns 4-5), where BLAST, PSI-BLAST, PRC, FFAS and HH- 
search programs have the lowest RMSD to the targets (—7-9 A). 
However, the alignments by these programs tend to have a smaller 
number of residues aligned (i.e. lower alignment coverage. Columns 
6-7), typically below 80%. Such short alignments can have a negative 
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Table 1 | Summary of template identification by different alignment methods 
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nments 














MUSTER 


0.435(0.449) 


0.487(0.512) 


10.3(15.2) 


8.7(12.3) 


0.875 


0.875 


27.0 


HHsearch-ll 


0.429(0.449) 


0.477(0.507) 


9.5(21.6) 


9.0(14.1) 


0.767 


0.820 


13.0 


dPPAS 


0.426(0.438) 


0.481(0.502) 


9.6(20.6) 


8.5(15.3) 


0.819 


0.844 


17.0 


PPAS 


0.424(0.441) 


0.473(0.499) 


10.3(17.4) 


8.9(13.4) 


0.839 


0.850 


10.0 


SP3 


0.424(0.438) 


0.476(0.499) 


10.7(15.7) 


9.1(12.3) 


0.873 


0.873 


1 1.0 


HHsearch-l 


0.422(0.444) 


0.472(0.502) 


9.5(20.3) 


9.0(14.7) 


0.763 


0.817 


16.0 


SPARKS 


0.421(0.433) 


0.469(0.493) 


1 1.0(15.7) 


9.4(12.1) 


0.891 


0.886 


36.0 


PROSPECT 


0.418(0.428) 


0.469(0.490) 


1 1.5(13.3) 


9.9(1 1.3) 


0.914 


0.903 


15.0 


PPA 


0.397(0.413) 


0.447(0.469) 


10.9(17.5) 


9.7(15.5) 


0.844 


0.851 


25.0 


FFAS 


0.393(0.406) 


0.444(0.465) 


9.5(24.2) 


8.6(18.9) 


0.758 


0.790 


4.0 


PRC 


0.372(0.388) 


0.417(0.442) 


8.6(32.9) 


8.0(24.3) 


0.668 


0.712 


23.0 


Average 


0.415(0.429) 


0.465(0.5) 


10.2(19.5) 


9.0(17.7) 


0.818 


0.838 


17.9 


Sequence-to-profile alignments 














SAM 


0.344(0.358) 


0.405(0.426) 


10.6(27.5) 


9.9(1 8.3) 


0.717 


0.778 


8.0 


PSA 


0.338(0.333) 


0.371 (0.392) 


12.9(17.5) 


12.0(15.0) 


0.870 


0.873 


9.0 


PSI-BLAST 


0.301(0.320) 


0.344(0.369) 


7.8 51.7 


7.4 42.1 


0.507 


0.556 


4.0 


Average 


0.328(0.337) 


0.373(0.4) 


10.5(32.3) 


9.8(25.1) 


0.698 


0.736 


7.0 


Sequence-to-sequence alignments 














NW-align 


0.321(0.336) 


0.377(0.403) 


12.7(21.7) 


1 1.4(15.0) 


0.849 


0.866 


5.0 


SW-align 


0.265(0.285) 


0.324(0.348) 


9.9(49.5) 


9.2(35.7) 


0.560 


0.625 


4.0 


BLAST 


0.246(0.263) 


0.292(0.315) 


8.5(59.7) 


8.2(47.5) 


0.470 


0.529 


0.1 


Average 


0.277(0.295) 








0.626 


0.673 




Other controls 
















TM-align 


0.661(0.664) 


0.663(0.683) 


3.1(7.5) 


3.0(7.1) 


0.856 


0.846 


90.0 


MUSTER== * * 


0.482(0.51 1) 


0.512(0.559) 


8.0(14.1) 


7.2(1 1.2) 


0.797 


0.800 


26.0 


MUSTER== + 


0.453(0.481) 


0.493(0.536) 


9.5(12.7) 


8.1(1 1.3) 


0.831 


0.820 


26.0 


MUSTER=s 


0.447(0.474) 


0.487(0.528) 


9.7(12.7) 


8.3(1 1.3) 


0.839 


0.830 


26.0 


"Alignment methods as sorted by TM-score in each category. 
''Average TM-score. Values in parentheses are for full-length models 
score among the top ten models with the highest alignment scores. 
=RMSD to the native. 


built by MODELLER. 'First' 


refers to the top-ranking modt 


1 based on alignment score; ' 


SestintoplO'tothe 


best model of the 


highest 



^Alignment coverage equals to the number of aligned residues divided by target length. 

"Average CPU time in minutes, which consists of constructing profile and building of alignments in a hIP DLl OOOh computer. 



impact on the full-length structure construction by homology 
modeling since structure information is missed for a large portion 
of unaligned sequences. In fact, the fuU-length models by 
MODELLER-'^ have a very high RMSD (>20 A) for all these local 
alignment methods (see values in parentheses). Here, the full-length 
models ware generated by the script model-default.py in the 
MODELLER package. The modeling results from MODELLER are 
deterministic in the sense that more runs do not change the quality 
result of the final models. 

In Columns 2-3, we also list the result of the alignment models on 
TM-score, which is defined to combine the alignment accuracy and 
coverage as a unique score*"": 

(1) 



TM- 



- score = - - 
^ i = i 



1.24(1-15)^/^-1 



where L is the length of target sequence, L^n the number of aligned 
residues, and d, the pair- wise distance of rth residue in the model and 
target after the optimal superposition. In this scoring function, the 
programs, which have a better balance of alignment coverage and 
RMSD, excel, including MUSTER, HHsearch and PPAS programs. 
The simple sequence-to-sequence based alignment algorithms gen- 
erally have a lower TM-score. 

Meanwhile, the local-to-local alignment based algorithms gen- 
erally have a lower coverage and TM-score compared to the glo- 
bal-to-global alignment methods. A typical example is SW-align 
based on Smith-Waterman, which only identifies the highly con- 
served regions and has on average 56% residues aligned, while 



Needleman-Wunsch based NW-align uses the same parameter and 
scoring function but generates alignments with a much higher cov- 
erage (84.9%). Accordingly, the TM-score of NW-align is 21.1% 
higher than that of SW-align. The completeness of alignment search- 
ing also plays a role in final model determination. For instance, both 
BLAST and SW-align are local sequence alignments based on 
BLOSUM62 mutation scores. But BLAST searches are based on an 
incomplete heuristic word search algorithm, which has an average 
TM-score 7% lower than SW-align. BLAST is however 39 times 
faster than SW-align in our test. 

Although TM-score aims to balance the accuracy and coverage of 
alignments, it still favors algorithms that have a higher coverage, 
since including additional residues in the alignments always has a 
positive contribution to TM-score according to Eq. 1, although the 
contribution is small if the added residues from templates are far 
away from the target. To examine the effect of such bias, we con- 
structed fuU-length models of the targets based on the alignments, 
using the widely-used comparative modeling tool MODELLER''\ 
Although TM-score is now normalized by the same length of the 
target sequence, the TM-score ranking of full-length models is lar- 
gely consistent with that of the original alignments, except for some 
small but notable variations. Taking the top hits as an example, the 
original alignments by HHsearch-I have a lower TM-score than 
those by dPPAS (0.422 vs. 0.426) due to the low coverage of the 
sequences (76.3% vs. 81.9%). After full-length modeling, the TM- 
score of HHsearch-I becomes higher than that of dPPAS (0.444 vs. 
0.438) and several other related algorithms (e.g. PPAS and SP3). 
Here, the more precise alignments by HHsearch-I in the aligned 
regions have probably introduced some restraint/guidance to 
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Figure 1 | TM-score histogram of the top hits identified by different algorithms in Easy, Medium and Hard categories. 



the structure modeling of the unaUgned regions, e.g. through bond- 
length and change connectivity, which have resulted in models of a 
higher overall TM-score. 

Performance of sequence alignment programs in different target 
categories. The performance of different alignment programs varies 
with the difficulty of the targets, i.e. the evolutionary distance 
between target and template proteins. If we use the target structure 
as a probe to search through the PDB library by TM-align"", the 
average TM-scores of aligned regions of the best structural 
templates in the three categories of Easy, Medium and Hard are 
0.779, 0.666 and 0.586, respectively, after excluding homologous 
templates with a sequence identity > 30%. This data on one hand 
sets up an upper-bar for template identifications by fold- recognition; 
on the other hand, it demonstrates that the target category as defined 
by the LOMETS prediction is largely consistent with the actual 
difficulty of the template identification for the targets. 

In supplementary Tables SI, S2, and S3, we summarize the results 
of different programs on the Easy, Medium and Hard targets, 
respectively. Figure 1 is the histogram of the average TM-score 
achieved by different programs. As shown in Table SI, HHsearch 
programs generate the highest TM-score in the Easy targets. 
MUSTER and other structure-assisted alignment methods (dPPAS, 
SP3 etc) generally outperform the HHsearch programs in the 
Medium and Hard targets. This data demonstrates the usefulness 
of structure-based features in detecting the distant homologous tem- 
plates. 

Specificity of alignment programs. Except for the accuracy of the 
template alignments (or sensitivity), the specificity of the alignments 
(i.e. the correlation of the scoring function and the accuracy of the 
final alignments) is another important measurement of the 
alignment algorithms, as this correlation essentially decides how 
the results can be used in the comparative structural modeling and 
function annotations. 

In Figure 2, we present the TM-score data of the highest ranked 
alignment models versus the alignment scores by the 20 alignment 
programs. Here, we tried both the default alignment score of the 
programs and the Z-score (defined as the difference between the 
raw alignment score and average in units of standard deviation), 
and chose the one with the highest correlation to the TM-score of 
the final models to present in the plot. As expected, positive correla- 
tions are observed for all the alignment programs, with PPAS, SAM 



and MUSTER having the highest Pearson correlation coefficients 
(0.789, 0.787, and 0.782, respectively). The NW-align and BLAST 
programs have the lowest correlation coefficient because a number of 
targets have a high alignment score but with low quality (TM-score 
< 0.5), indicating a low specificity of the programs. 

We also mark in Figure 2 an alignment score cut-off that mini- 
mizes the false positive rate, FPR = FP/(FP + TN), and the false 
negative rate, FNR = FN/(TP + FN), where a model of TM-score > 
0.5 is defined as a positive hit that has the correct foW". The score 
cut-offs, false positive and false negative rates are listed in Table 2. 
The programs with alignment score that are calibrated by the stat- 
istics of random samples, including PSI-BLAST, SAM and FFAS, 
have the lowest FPR + FNR values, i.e. the highest specificity, based 
on this measurement. Meanwhile, the Easy and Hard targets are 
clearly grouped in the right-up and left-bottom regions in Figure 2 
for all programs, demonstrating the dependence of the performance 
of the alignment algorithms on the evolutionary distance of target 
and templates. 

Profile-based alignments versus sequence-based alignments. De- 
pending on whether the homologous sequences are included in the 
target- template alignments, the alignment methods can be grouped 
into the three general categories of sequence-to-sequence alignment 
(including NW-align, SW-align and BLAST), sequence-to-profile 
alignment (PSI-BLAST, SAM and PSA), and profile-to-profile 
alignment (PRC, HHsearch-I, HHsearch-II, PPA, PPAS, dPPAS, 
MUSTER, PROSPECT, SPARKS, SP3 and FFAS). Since the sequ- 
ence profiles derived from multiple sequence alignment of protein 
families contain important information of conserved/diverged loca- 
tions along the sequences, the profile-based alignments can generally 
generate more accurate target-template alignments than that made 
by single sequence-based alignments""-'". 

Such insight is also observed in our data analysis. As shown in 
Table 1 (rows highlighted in bold), the average TM-score obtained by 
the sequence-to-profile based methods is 18.4% higher than the TM- 
score from the sequence-to-sequence based methods. Similarly, the 
TM-score from profile-to-profile alignment methods is 49.8% higher 
than that of sequence-to-sequence based methods. These increases in 
TM-score are not only due to the higher coverage of alignments 
(81.8% vs. 62.6%), but also the enhanced accuracy of alignments as 
the average RMSD is reduced slightly in the profile-profile methods 
from 10.4 to 10.2 A. This tendency is also seen in Tables Sl-3 where 
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Figure 2 | TM-score of full-length models of 20 threading methods on 538 non-homologous proteins versus the alignment scores. Easy, Medium and 
Hard targets are colored blue, green and red, respectively. PSI-BLAST, BLAST and PRC use bit score and others use z-score to score the alignments. 



the targets were categorized into different groups of Easy, Medium, 
and Hard, demonstrating that the profile-based alignments enhance 
both close and distant homology identifications. 

Two types of sequence profiles, PSSM and HMM, are often 
exploited in various alignment methods. Given a MSA, the PSSM 
profile is designed to account for the estimated frequency of amino 
acids at each position, while the HMM profile accounts for both 
amino acid frequency and position-specific probabilities for inser- 
tion and deletion. Although the HMMs seem to contain additional 
gap information from MSAs, there is no obvious difference between 
the HMM-based (e.g. HHsearch-I and -II) and PSSM-based align- 
ment algorithms (e.g. PPAS), in terms of the TM-score of the align- 
ment models (Table 1). However, HMM-based methods did generate 



higher TM-scores than PSSM-based methods in the Easy targets 
(Table SI). Additionally, the HMM-based methods have generally 
a lower RMSD and lower coverage of alignments, indicating that the 
HMM method is more sensitive in detecting local structural motifs 
and scaffolds. 

Meanwhile, there are a number of targets that have the correct 
templates identified by either HMM- or PSSM-based methods (but 
not both), demonstrating that these two types of methods are com- 
plementary to each other. This complementarity from multiple 
alignment algorithms is essential to the success of meta-server based 
structure modeling approaches'"'^". 

How much space is left for improvement by structural feature 
prediction? The performance of profile alignments could be 
further improved by incorporating structural information. One 
example is secondary structure comparison, which has been used 
by almost all contemporary alignment/threading methods to guide 
the target- template alignments. As a quantitative test of the impact of 
secondary structure information on alignment accuracy, we 
developed two sequence profile-profile based methods, PPA and 
PPAS, where the only difference is that PPAS contains a secondary 
structure match in the scoring function but PPA doesn't (see Eqs. 3 
and 4 in METHODS). As a result, the inclusion of secondary 
structure prediction increases the TM-score of PPA by 6.8%. 
MUSTER is another typical profile-profile alignment based 
algorithm that incorporates multiple composite structure features 
in the alignments, including secondary structure, residue depth, 
solvent accessibility, and backbone torsion angle predictions. These 
features result in a TM-score increase of 9.6% compared to the PPA 
method, corresponding to a p-value < 10 in paired student t-test. 

The performance of the structure feature assisted algorithms relies 
on the accuracy of structure feature predictions for the target 
sequence. In our test on the 538 non-homologous proteins, the aver- 
age Q3 score (three structure states per residue overall accuracy) for 
PSSpred and PSI-pred is 83.1% and 80.7%, respectively; the mean 
absolute errors in \\i and (p angle predictions are 28^ and 41°, respect- 
ively; and the Pearson correlation coefficient correlation between 



Table 2 | Score cutoffs and false positive and negative rates of dif- 
ferent programs 



Methods* 


Cutoff 


FPR 


FNR 


FPR + FNR 


PSI-BLAST 


50.4 


0.093 


0.094 


0.187 


SAM 


14.5 


0.129 


0.099 


0.229 


FFAS 


12.9 


0.170 


0.100 


0.270 


PPA 


7.8 


0.126 


0.158 


0.284 


SP3 


6.5 


0.117 


0.175 


0.292 


SPARKS 


6.4 


0.1 1 1 


0.194 


0.306 


SW-align 


8.0 


0.162 


0.149 


0.31 1 


PRC 


20.8 


0.1 10 


0.205 


0.316 


BLAST 


35.7 


0.149 


0.169 


0.318 


PPAS 


6.9 


0.186 


0.145 


0.331 


dPPAS 


13.2 


0.1 13 


0.233 


0.346 


HHsearch-I 


8.1 


0.172 


0.179 


0.351 


MUSTER 


6.2 


0.147 


0.205 


0.353 


HHsearch-II 


9.3 


0.16 


0.200 


0.360 


PROSPECT 


4.2 


0.075 


0.317 


0.392 


PSA 


4.1 


0.128 


0.400 


0.528 


NW-align 


1.5 


0.464 


0.160 


0.625 



*Methods sorted by sum of false positive rote (FPR) and false negative rote (FNR). 
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predicted and actual solvent accessibility is 0.678. Incorrect assign- 
ments of the structure features can compromise the performance of 
MUSTER. In fact, we observed a number of cases where the TM- 
score of the alignments by MUSTER, which considers additional 
structural features, is lower than that of PPAS. 

In order to explore the potential of the alignment improvement 
obtained by considering structural features, we implemented 
MUSTER using the native structure features derived from the target 
structures, where the weighting parameters were re-optimized in a 
separate test of 100 proteins. As shown in Table 1, the average TM- 
score of the full-length models from MUSTER alignments showed a 
gradual increase from 0.449 to 0.511, when we exploited more 
native structure features from secondary structures (MUSTER^^), 
backbone torsion angle (MUSTER^^*®™), and solvent accessibility 
(MUSTER^^*®™*^^). This change corresponds to an overall increase 
of 13.8% in the average TM-score. 

In Figure 3, we present an illustrative example from the PP7 bac- 
teriophage coat protein (PDB ID: 2qudA), which has the secondary 
structures arranged as pi-P2-P3-P4-P5-P6-al-cx2 from the N- to C- 
terminals. The PSI-pred method has however mis-assigned most 
secondary structure elements in 12S-90M (see '*' in Figure 3A), 
which resulted in the first two beta-strands (7T-11S and 15A-25T) 
in the target mis-aligned to the coiled regions in the template IqbeB 
by MUSTER. This alignment has a TM-score = 0.607. When using 
the actual secondary structure assignment, the MUSTER^^ program 
correctly matches the two beta strands of the target with the strands 



on the template. Based on the same template, the correction of the 
secondary structure comparison increases the TM-score of the 
model to 0.7 in this example. 

Despite the significant increase in alignment accuracy brought by 
the integration of structure features, the quality of the alignments 
using the best structure features from the native is still far from the 
best templates detected by structural alignments, i.e. the average TM- 
score by TM-align is 37.1% higher than that by MUSTER'^ + "'^'^ + 
(Table 1), which demonstrates considerable room for further align- 
ment improvement. The gap is relatively small in Easy targets (7.4%) 
according to the data in Table SI, which indicates that the current 
state-of-the-art alignment methods generate nearly optimal align- 
ments for close homology targets. But for the Medium and Hard 
targets, the gaps become highly significant, which correspond to a 
TM-score difference of 35.6% and 79.2%, respectively (Tables S2 and 
S3). Apparently, such gaps cannot be filled by solely improving the 
structure feature prediction methods, and a completely different 
alignment system based on novel scoring and alignment schemes 
might be required. 

Discussion 

We developed a comprehensive experiment to systematically exam- 
ine the strength and weakness of 20 representative sequence align- 
ment methods for template-based protein structure prediction. The 
data analysis demonstrates the dominant advantage of profile-profile 
based alignment methods in protein template identification, which 



N terminal 



(A) 




N terminal 



(B) 



al 




C terminal 



TM-score=0.607 RMSD=6.2 



obs 

pred 



CCCCC 



ccccc ;eeeee-ec-ceeeeee- 



EEEEE-CH-KHKKKKK- 



-EEEEECCE- 
-HHHHHHHH- 



2qudA 
IqbeB 



-EEEEEC CCCCCCCC 
-HKHHHJ CCCCEEEE 

GGSMSKTIVLS-VG-EATRTLT EIQSTADR — QIFEEKVGPLVGRL 42 

I I I I I [ I 

-AKIE-TVTLGNIGKDGKQTLVLNPRGVNPTNG-VASLSQAGAVP-ALEK, 46 
-CCCC-CEEEECCCCCCCCEEEEEEEEEECCCC-EEEEECCCCCC-CCCC 



obs EEEEEEEE — CCCCCEEEEEEEEEE — EEEECCCC-CEEEEEEEEEEEEE 

pred EECHHHHH — CCCCCEEEEEEECCC — CCCCCCCC-CCEEEEEEECCCEE 

2qudA 4 3 RLTASLRQ — NGAKTAYRVNLKLDQ — ADWDSGL-PKVRYTQVWSHDVT 

i 1 I I I I I I i II 

IqbeB 47 RVTVSVSQPSRNRKN-YKVQVKIQNPTAGTANGSCDPSYTROAYADVTFS 

obs EEEEEEECCCCCCCC-EEEEEEEEEEEEECCCCCCCCEEEEEEEEEEEEE 




C terminal 



TM-score=0.700 RMSD=4.7 



obs CCCCCCEEEEE-EC-CEEEEEEEEEEECC EEEEEECCCCCCCCCE 

2qudA 1 GGSMSKTIVIS-VG-EATRTLTEIQSTAD RQIFEEKVGPLVGRIR 43 

I I I I I II 

IqbeB 1 -AKLE-TVTLGNIGKDGKQTLVUIPRGVNPTKGVASLSORGAVP-ALEKR 47 

obs -CCCC-CEEEECCCCCCCCEEEEEEEEEECCCCEEEEECCCCCC-CCCCE 

obs EEEEEEE — CCCCCEEEEEEEEEEEEEECCCC CEEEEEEEEEEEEE 

2 qudA 4 4 LTASLRQ — NGAKTAYRVNLKIDQADWDSGL PKVRYTQVWSHDVT 8 7 

III I I I I II 

IqbeB 48 VTVSVSQPSRNRKN-YKVQ¥KIQNPTACTAK~GSCDPSVTRQfiYADVTFS 35 

obs EEEEEECCCCCCCC-EEEEEEEEEEEEECCC-CCCCCEEEEEEEEEEEEE 

obs EECCCCHHHHHHHHHHHHHHHHCHHHHHHHHHCCCCCC 
2qudA 88 IVANSTEASRKSLYDLTKSLVATSQVEDLWNLVPLGR 125 

III II I II 

IqbeB 96 FTQYSTDEERAFVRTELAALLASPLLIDAIDQLNPAY- 132 

obs ECCCCCHHHHHHHHHHHHHHHHHHHHHHHHHHCCCCC - 



obs EECCCCHHHHHHHHHHHHHHHHCHHHHHHHHHCCCCCC 
pred EEECCCHHHHHHHHHHHHHHHHHHHHHHHEEEEEECCC 

2qudA 88 IVANSTEASRKSLYDLTKSLVATSQVEDLWNLVPLGR 125 

III II I II 

IqbeB 96 FTQYSTDEERAFVRTELAALLASPLLIDAIDQLNPAY- 132 

obs EC C C C CHHHHHJiHHHHHHHHHHHKHHHHHHHHC CC C C - 



Figure 3 | The illustration of template identifications for 2qudA. (A) MUSTER with predicted secondary structure; (B) MUSTER^^ with native 
secondary structure. The experimental structure and MUSTER models are shown in red and green cartoons, respectively, and the first two beta-strands in 
(A) are in yellow on the template. The secondary structures are labeled as 'C for coil, 'E' for strand and 'H' for helix, where 'Pred' and 'Obs' denotes the 
PSI-pred prediction and the native, respectively. in (A) marks the residues with mis-predicted secondary structure. 
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generates structural models with an average TM-score 26.5% higher 
than sequence-profile alignments, and 49.8% higher than single 
sequence-sequence alignments. The superiority of profile-based 
alignments over sequence-based alignments was also observed in 
previous benchmark studies'''''^'. 

The sequence profiles are typically constructed by PSI-BLAST and 
hidden Markov model (HMM) searches, where the former is usually 
specified by a position-specific scoring matrix (PSSM) and the latter 
by a trained chain model of matches and insertions/deletions. Our 
data analysis showed that there is no obvious difference between 
PSSM and HMM profiles in terms of overall average TM-score, 
although the HMM based alignments tend to obtain higher TM- 
scores in Easy targets and generate alignments of higher accuracy 
but with lower alignment coverage. This data seems in contradiction 
to the results by Park et al*' and Madera and Gough''^ who concluded 
that the HMM based methods consistently outperform PSI-BLAST. 
We believe that the major reason for the seeming contradiction is due 
to the difference in sample preparations. In our testing dataset, we 
intentionally included more medium and hard targets to keep a 
balanced category distribution in difficulty and all templates with a 
sequence identity > 30% to the target were excluded. In the experi- 
ments by Park et al and Madera and Gough, however, the authors 
collected large-scale proteins from SCOP without intention to 
include more hard proteins. In addition, the authors used a sequence 
identity cutoff 40% for template filtering, which includes homolog- 
ous templates with a sequence identity in 30-40% that are easy to 
detect by most threading methods. Therefore, it is anticipated that 
most of the test proteins in these two studies should correspond to 
Easy targets in our categorization and their conclusion on the HMM 
and PSSM profile comparisons is in fact consistent with our analysis 
on the Easy proteins (Table SI). 

The profile-based sequence alignments can be considerably 
improved by the combination of structure feature predictions. For 
example, the program of MUSTER, which combines profile align- 
ments with sequence-based secondary structure, residue depth, tor- 
sion angle and solvation predictions, has a 9.6% higher TM-score on 
average when compared to the profile-profile alignment algorithms. 
The performance of structure-assisted methods relies on the accu- 
racy of the sequence-based structure feature predictions, which can 
be further improved by nearly 10.8% (or 13.8% in full-length models) 
if the native structure features extracted from experimental struc- 
tures are exploited. Nevertheless, the latter is still far worse from the 
best possible templates as identified by structural alignment program 
TM-align, which uses the target structure as a probe to generate the 
optimal alignments. In the Easy, Medium and Hard categories, the 
TM-score by TM-align is 7.4%, 35.6%, and 79.2% higher than that of 
MUSTER'^' + "^'^ + While fUling such a big gap is one of the most 
urgent goals in template-based protein structure prediction, it appar- 
ently cannot be achieved solely by the improvement of structure 
feature prediction methods. New algorithms with completely novel 
scoring functions and alignment search engines are probably needed 
to attack the central problem of sequence alignment, which is essen- 
tial to template-based protein structure prediction and function 
annotations. 

Methods 

Twenty threading/alignment methods, covering different categories of target-tem- 
plate alignment algorithms and possible to install at local computers, are bench- 
marked in this article. All algorithms without cited references are newly developed in 
house and first presented in this study. 

1. NW-align. NW-align is a sequence-to-sequence alignment program constructed 
based on the standard Needleman-Wunsch dynamic programming algorithm^". The 
amino acid mutation matrix is from BLOSUM62^^ with gap opening penalty — — 11 
and gap extension penalty — —1. 

2. SW-align. SW-align is a sequence-to-sequence alignment program using a similar 
setting as NW-align but with dynamic programming based on the standard Smith- 
Waterman algorithm^"". The major difference from NW-align is that the negative 



score values are set to zero and the alignment trace-back procedure starts from the 
highest scoring cell and ends with a cell of zero score in SW-align. This setting allows 
SW-align to identify subsequence motifs having the highest local sequence sunilarity. 
The source codes and the executables of both NW-align and SW-align are available at 
http;//zhanglab.ccmb.med. umich.edu/NW-align. 

3. BLAST. BLAST^^ is a local sequence alignment tool based on a heuristic searching 
method, where high-scoring segment pairs {HSPs, or words) are first identified by 
gapless comparisons. The final alignments are constructed by extension and 
connection of the HSP regions. The heuristic algorithm in BLAST is often suboptimal 
but much faster than the optimal dynamic programming algorithms. 

4. PSLBLAST. PSI-BLAST^* is a sequence-to-profile alignment program extended 
from BLAST which aims to increase the alignment sensitivity of distant homologous 
proteins by iterative MSA search. It first collects a list of close homologous sequences 
from a reference database (e.g. NCBI non-redundant sequence database, NR) by 
BLAST. A PSSM is then derived from the MSA of the sequence homologies, which is 
used to search against the reference database again to identify a newer set of 
homologous sequences. The procedure can be repeated a number of times until the 
PSSM profiles converge. In our test, PSI-BLAST was searched against NR database for 
3 iterations using an E- value cutoff, which assesses the significance of the HSP score, 
below 0.001. 

5. PSA. PSA is sequence-to-profile alignment algorithm based on the Needleman- 
Wunsch dynamic programming. The scoring function of the ith position in the query 
(q) aligned with the jth position in the template {t) is 

20 

ScorepsA(;,;)= * B,(k,j) + shift (2) 

k = l 

where F^{i, k) represents the frequency profile of fcth amino acid at ith position of the 
query. Bt(k, j) denotes a BLOSUM mutation score between the amino acid k andy'th 
residue of the template. The shift parameter is introduced to avoid the alignment of 
unrelated residues in the local regions. Parameters of shift ( — 0.01), gap opening 
{go, — 8.6) and gap extension [ge, — 0.9) penalties were optimized on the ProSup 
dataset'*. 

6. PPA. PPA is an in-house profile-profile alignment method on the Needleman- 
Wunsch algorithm. The scoring function is defined by 

20 

Scor<;ppA(!,;)= X^f,(U) * L,(k,f) + shift (3) 

k=l 

where Fq(/, k) and Lf(j, k) stand for the sequence frequency profile of query and the 
log-odds profile of template, respectively. To build the sequence profiles, the 
sequences are searched against the NR by 3 PSI-BLAST iterations, at an E- value cutoff 
0.001. The Henikoff weighting scheme'^ is then used to generate frequency or log- 
odds profiles. Similarly, the parameters oi shift { — Q. 9 A), go ( — 6.8), and^e( — 0.52), are 
optimized by trial and error using the ProSup dataset. 

7. PPAS. PPAS is an in-house profile-profile alignment method that combines profile 
log-odds score and secondary structure comparison. The scoring function is defined 
by 

ScoreppAsiiJ) ^ ScoreppA{i,j) + CiS(S^{i),St(j)) (4) 

where Scorepp^ii, j) is defined in Eq. 3, 3{Sc,{i), St{j)) is the Kronecker delta function to 
assess the secondary structure match between target and template. S^{i) is the 
secondary structure of the ith residue on the target predicted by PSSpred 
(http://zhanglab.ccmb.med.umich.edu/PSSpred), and S((/) is the secondary structure 
of the jth residue on the template structure assigned by DSSP. A position-specific gap 
penalty scheme is used in the alignment search, i.e. no gap is allowed inside the 
secondary structure regions, go and^e penalties apply to other regions, and the ending 
gap-penalty is neglected. The four parameters Cj (0.65), shift ( — 0.96), go ( — 7.0), and 
ge ( — 0.54), are optimized for PPAS in a similar way as PPA. 

8. dPPAS. dPPAS is an in-house profile-profile alignment program extended from 
PPAS. The only difference from PPAS is that a structure fragment depth profile is 
added in dPPAS to enhance the alignments, i.e. 

20 

Score^PPAsiuj)- — +Ci3(S^{i).S,(j)) +shift (5) 

where Fq{i, k) and L;(/, k) are defined in Eq. 3. L^^J^, k) is a frequency depth profile 
derived from a set of structural fragments that have similar depth as the fragment at 
jth position of the template Similarly, the parameters (Cj — 6.5, shift — —0.96, go 
— —7.0 andge — —0.54) are optimized on the ProSup dataset. 

9. MUSTER. MUSTER^^ is a profile- profile based threading program which 
combines multiple sequence and structure matching information. In addition to the 
sequence profiles obtained by PSI-BLAST searches, the scoring function contains 
secondary structure match (SS), fragment depth profiles, solvent accessibility (SA), 
backbone torsion angles (BTA), and hydrophobic scoring matrix. The optimal 
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alignment is generated by Needleman- Wunsch dynamic programming. Compared to 
dPPAS, MUSTER contains additional terms from SA, BTA, and hydrophobic scoring 
matrix matches, whereby the weighting parameters are re-trained by a new dataset. 

To further examine the potential of structure-assisted threading algorithms, we 
developed three variants of MUSTER programs, MUSTER'', MUSTER'' ^ and 
MUSTER" + ^''^^ + '^ which exploit the SS, BTA, and SA features extracted from the 
experimental structures of the target. Similar to MUSTER, all parameters in these 
algorithms are optimized in a separate training set of 100 non-redundant proteins by 
maximizing the TM- score. 

10. SAM. SAM^^ is a hidden Markov model (HMM) based protein fold -recognition 
method. Starting from the PSI-BLAST search, SAM constructs a HMM profile based 
on the iterative MSA searches. The HMM profile is then used to search through the 
PDB library to identify structural templates. SAM can conduct both local and global 
alignment searches and we use the local alignment mode in this work. 

11. PRC. PRC^^ is a program for scoring and aligning profile HMMs. To run PRC, we 
first construct HMMs of both target and template sequences by SAM^^. The HMM- 
HMM based alignments are then computed and ranked by PRC which is designed to 
find the Viterbi path that maximizes the sum of forward-backward odds scores. 

12. HHsearch-I and HHsearch-II. HHsearch^* is a HMM-HMM based alignment 
program which combines the profile log-odds score and the secondary structure 
prediction in the Viterbi dynamic programming. We run two versions of HHsearch: 
HHsearch-I uses PSI-BLAST to start the MSA search for building the profile HMMs 
for target and template sequences, while HHsearch-II uses HHblits to construct the 
profile HMM for target sequences. HHblits uses a discretized-profile prefilter that can 
generate HMM profiles faster than PSI-BLAST^^. The final query- template 
alignments are constructed by the same HHsearch program. Both HHsearch-I and 
HHsearch-II are in the local alignment mode. 

13. PROSPECT. PROSPECT^^ is a sequence profde-profde alignment algorithm 
assisted with a residue-level contact potential and SS predictions. A global 
optimization of target -template alignment is generated by the divide- and- conquer 
searching method. 

14. SPARKS and SP3. Both SPARKS'' and SP3'' were developed in Zhou Lab. In 
SPARKS, the authors exploit a sequence profde-profile alignment combined with a 
single-body statistical potential; in SP3, they use a residue depth dependent structure 
profile to replace the single-body potential used in SPARKS. 

15. FFAS. FFAS^^ is a sequence profile -profile based alignment program. It calculates 
the sequence profile by PSI-BLAST searching against the NR85s database with 5 
iterations. A dot-product scoring function is then used to align two sequence profiles. 
The alignment score is finally translated into a statistical measure by comparing it 
with the distribution of scores obtained for pairs of unrelated proteins. 
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